Concerning the NJ algorithm and its unweighted version, UNJ

Author

  • Olivier Gascuel
Abstract

In this paper we present UNJ, an unweighted version of the NJ algorithm (Saitou and Nei 1987; Studier and Keppler 1988). We demonstrate that UNJ is well suited when the data are of the type $\delta_{ij} = d_{ij} + \varepsilon_{ij}$, where $(d_{ij})$ is a tree distance and the $\varepsilon_{ij}$ are independent and identically distributed noise variables. Simulations confirm this theory. More generally, we study the three main components of the agglomerative approach as applied to the reconstruction of tree distances. (i) We demonstrate that the criterion used by NJ and UNJ to select the pair to be agglomerated retains its meaning whatever the variances and covariances of the $\delta_{ij}$ estimates. We also provide a new proof of the correctness of this criterion, based on an interpretation in terms of acentrality proposed by Mirkin (1996). (ii) Using the results of Vach (1989), for which we provide a simple new proof, we propose an analytical formula that yields the correct least-squares estimates of edge lengths in $O(n)$ time, where $n$ is the number of objects. (iii) We provide a class of admissible reduction formulae which guarantee that the true tree is found when the data are additive. Among these formulae, we propose choosing the minimum-variance reduction, so that at each step the pair to be agglomerated is chosen using estimates that are as reliable as possible. We present the general solution and apply it to the particular data model adopted here.
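The agglomerative step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's UNJ implementation. It uses the standard NJ selection criterion and edge-length formula, together with a one-parameter reduction family that is assumed here for illustration only (the abstract does not spell out its class of admissible formulae). The names `make_matrix`, `row_sums`, `nj_select`, `reduce_pair` and `lam` are hypothetical. With `lam = 0.5` the reduction coincides with the classical NJ update; the example shows that, on additive data, the reduced distances do not depend on `lam`.

```python
from itertools import combinations

def make_matrix(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): d}."""
    D = {}
    for (a, b), d in pairs.items():
        D.setdefault(a, {})[b] = d
        D.setdefault(b, {})[a] = d
    return D

def row_sums(D, nodes):
    """R_x = sum of distances from x to every other active node."""
    return {x: sum(D[x][i] for i in nodes if i != x) for x in nodes}

def nj_select(D, nodes):
    """Pair minimising the standard NJ criterion Q_xy = (r-2) d_xy - R_x - R_y."""
    r = len(nodes)
    R = row_sums(D, nodes)
    return min(combinations(nodes, 2),
               key=lambda p: (r - 2) * D[p[0]][p[1]] - R[p[0]] - R[p[1]])

def reduce_pair(D, nodes, x, y, lam=0.5):
    """Join x and y into a new node u (requires at least 3 active nodes).

    Returns the two edge lengths (standard NJ estimates) and the distances
    from u to the remaining nodes. The reduction
        d_ui = lam * (d_xi - d_xu) + (1 - lam) * (d_yi - d_yu)
    is an illustrative assumption; lam = 0.5 gives the classical NJ update.
    """
    r = len(nodes)
    R = row_sums(D, nodes)
    d_xu = 0.5 * D[x][y] + (R[x] - R[y]) / (2 * (r - 2))
    d_yu = D[x][y] - d_xu
    d_u = {i: lam * (D[x][i] - d_xu) + (1 - lam) * (D[y][i] - d_yu)
           for i in nodes if i not in (x, y)}
    return d_xu, d_yu, d_u

# Additive distances for the tree ((A:1, B:2):3, (C:4, D:5)).
nodes = ["A", "B", "C", "D"]
D = make_matrix({("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
                 ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9})

x, y = nj_select(D, nodes)  # ('A', 'B'), a true cherry of the tree
for lam in (0.5, 0.25):
    # Both values of lam give (1.0, 2.0, {'C': 7.0, 'D': 8.0})
    print(lam, reduce_pair(D, nodes, x, y, lam))
```

With noisy data the choice of `lam` does change the reduced distances, which is where a minimum-variance choice of the kind the abstract proposes would matter.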


Similar resources

Sum-Max Graph Partitioning Problem

This paper tackles the following problem: given a connected graph G = (V,E) with a weight function on its edges and an integer k ≤ |V|, find a partition of V into k clusters such that the sum (over all pairs of clusters) of the heaviest edges between the clusters is minimized. We first prove that this problem (and even the unweighted variant) cannot be approximated within a factor of $O(n^{1-\varepsilon})$ u...


Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle (ALGORITHMS FOR MINIMUM EVOLUTION, August 15, 2002)

The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix, and then do a topological search from that starting point. The first stage...


Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle

The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first st...


Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle (ALGORITHMS FOR MINIMUM EVOLUTION, August 26, 2002)

The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix, and then do a topological search from that starting point. The first stage...


On-Line Load Balancing in a Hierarchical Server Topology

In a hierarchical server environment jobs are to be assigned in an on-line fashion to a collection of servers which form a hierarchy of capability: each job requests a specific server meeting its needs, but the system is free to assign it either to that server or to any other server higher in the hierarchy. Each job carries a certain load, which it imparts to the server it is assigned to. The g...




Publication date: 1996